perf: reduce HashMap/collection allocation overhead in gateway path #48662
Draft
xinlian12 wants to merge 15 commits into Azure:main from
Eliminate per-response intermediate HashMap allocation by adding a new StoreResponse constructor that accepts HttpHeaders directly. Header names and values are populated into String[] arrays without materializing an intermediate Map. The JsonNodeStorePayload is updated to accept header arrays and only builds a Map lazily on error paths (extremely rare).

Pre-size HashMaps throughout the hot path to avoid resize/rehash:
- HttpHeaders request construction: sized to defaultHeaders + request headers
- StoreResponse.replicaStatusList: pre-sized to 4
- StoreResponse.withRemappedStatusCode: pre-sized to header count
- RxDocumentServiceRequest fallback maps: pre-sized to 32

Fix HttpUtils.asMap() double-allocation by iterating HttpHeaders directly instead of calling toMap(), which creates an intermediate HashMap.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
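The array-based header storage with a lazily built Map described above could look roughly like the following. This is a minimal sketch, not the PR's code: `HeaderArrays`, its constructor argument, and its method names are hypothetical stand-ins for the StoreResponse/JsonNodeStorePayload changes, and the input is a plain list of entries rather than the SDK's HttpHeaders type.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical holder that keeps header names and values in parallel arrays. */
final class HeaderArrays {
    private final String[] names;
    private final String[] values;
    private Map<String, String> lazyMap; // only built if a caller asks for it

    HeaderArrays(List<Map.Entry<String, String>> headers) {
        int n = headers.size();
        names = new String[n];
        values = new String[n];
        int i = 0;
        // Copy straight into the arrays; no intermediate Map is created here.
        for (Map.Entry<String, String> header : headers) {
            names[i] = header.getKey();
            values[i] = header.getValue();
            i++;
        }
    }

    int size() {
        return names.length;
    }

    /** Linear scan; acceptable for the few dozen headers of a typical response. */
    String value(String name) {
        for (int i = 0; i < names.length; i++) {
            if (names[i].equalsIgnoreCase(name)) {
                return values[i];
            }
        }
        return null;
    }

    /** Builds the Map on first use (for example on an error path) and caches it. */
    Map<String, String> asMap() {
        if (lazyMap == null) {
            // Sized so that inserting every header stays under the 0.75 load factor.
            Map<String, String> map = new HashMap<>(names.length * 4 / 3 + 1);
            for (int i = 0; i < names.length; i++) {
                map.put(names[i], values[i]);
            }
            lazyMap = map;
        }
        return lazyMap;
    }
}
```

On the common success path only the two arrays are allocated; the HashMap and its nodes appear only if `asMap()` is actually called.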
alzimmermsft
reviewed
Apr 1, 2026
@@ -51,14 +52,14 @@ public static String urlDecode(String url) {

     public static Map<String, String> asMap(HttpHeaders headers) {
         if (headers == null) {
-            return new HashMap<>();
+            return new HashMap<>(4);
         }
         HashMap<String, String> map = new HashMap<>(headers.size());
Member
You should also change this instantiation:
Suggested change
-        HashMap<String, String> map = new HashMap<>(headers.size());
+        HashMap<String, String> map = new HashMap<>((int) (headers.size() / 0.75F) + 1);
Internally, HashMap resizes once its size exceeds its capacity times the load factor (0.75 by default), so a map sized to exactly headers.size() can still trigger a resize during this conversion.
Member
Author
good catch, yea will change, thanks ~~
Member
Depending on how many locations start doing this, may want to add in a helper method for this.
…ve null-guard inconsistency

- Fix HashMap<>(4) to HashMap<>(6) for replicaStatusList to avoid rehash at 4 replicas (capacity 4 * 0.75 = threshold 3, resizes on 4th insert)
- Refactor JsonNodeStorePayload: extract shared parseJson() method with Supplier<Map<String,String>> to eliminate duplicated error-handling logic
- Remove misleading null ternary in getHttpRequestHeaders() since getHeaders() always returns non-null (fallback HashMap<>(32))
- Revert HashMap<>(16) to HashMap<>() in HttpHeaders default constructor (16 is already the default capacity, change was no-op noise)
- Add unit tests for StoreResponse HttpHeaders constructor, HttpHeaders populateLowerCaseHeaders, and JsonNodeStorePayload array-header constructor

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
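The threshold arithmetic quoted in this commit (capacity 4 gives threshold 3, while 6 is rounded up to 8 and gives threshold 6) can be reproduced with a small sketch. `HashMapSizing` and its methods are hypothetical helpers written for illustration; they mirror the power-of-two rounding and the default 0.75 load factor that java.util.HashMap uses, but they are not part of the JDK or of this PR.

```java
/** Hypothetical helper that mirrors how java.util.HashMap sizes its table. */
final class HashMapSizing {
    private static final float DEFAULT_LOAD_FACTOR = 0.75f;

    /** Smallest power of two that is >= the requested initial capacity (minimum 1). */
    static int tableSize(int requestedCapacity) {
        int size = 1;
        while (size < requestedCapacity) {
            size <<= 1;
        }
        return size;
    }

    /** Number of entries the table can hold before the next insert triggers a resize. */
    static int threshold(int requestedCapacity) {
        return (int) (tableSize(requestedCapacity) * DEFAULT_LOAD_FACTOR);
    }

    public static void main(String[] args) {
        for (int requested : new int[] {4, 6, 16, 32}) {
            System.out.println("new HashMap<>(" + requested + "): table="
                + tableSize(requested) + ", threshold=" + threshold(requested));
        }
    }
}
```

Under these assumptions a map created with capacity 4 resizes on the 4th insert, while capacity 6 leaves room for 6 entries, and capacity 32 leaves room for 24.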
Fix HashMap<>(headers.size()) in HttpUtils.asMap() to account for the 0.75 load factor, avoiding resize when all headers are inserted.

Extract mapCapacityForSize(int) helper in HttpUtils to consolidate the capacity calculation (n * 4 / 3 + 1) used across HttpUtils.asMap(), StoreResponse.withRemappedStatusCode(), and JsonNodeStorePayload.buildHeaderMap().

Addresses review feedback from alzimmermsft.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
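A helper along these lines might look as follows. Only the name mapCapacityForSize(int) and the formula n * 4 / 3 + 1 come from the commit message; the enclosing class `MapSizing`, the argument check, and the `copyOf` usage method are assumptions made for this sketch.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical home for the helper; the PR places it in HttpUtils. */
final class MapSizing {
    /**
     * Initial capacity that lets a HashMap hold expectedSize entries
     * without resizing under the default 0.75 load factor.
     */
    static int mapCapacityForSize(int expectedSize) {
        if (expectedSize < 0) {
            throw new IllegalArgumentException("expectedSize must be >= 0");
        }
        // Integer form of expectedSize / 0.75 + 1; assumes sizes far below Integer.MAX_VALUE / 4.
        return expectedSize * 4 / 3 + 1;
    }

    /** Example use: copy parallel name/value arrays into a pre-sized map. */
    static Map<String, String> copyOf(String[] names, String[] values) {
        Map<String, String> map = new HashMap<>(mapCapacityForSize(names.length));
        for (int i = 0; i < names.length; i++) {
            map.put(names[i], values[i]);
        }
        return map;
    }
}
```

For 24 headers the helper returns 33, which HashMap rounds up to a 64-slot table with a threshold of 48, so all 24 inserts complete without a rehash.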
xinlian12
commented
Apr 5, 2026
...mos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/RxDocumentServiceRequest.java
xinlian12
commented
Apr 5, 2026
sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/RxGatewayStoreModel.java
Outdated
xinlian12
commented
Apr 5, 2026
...mos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/RxDocumentServiceRequest.java
xinlian12
commented
Apr 5, 2026
sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/http/HttpHeaders.java
…ize to HttpHeaders, clarify lowercase key guarantee

- Add comments explaining why HashMap<>(32) is used as fallback in RxDocumentServiceRequest: capacity 32 gives threshold 24, covering typical 15-20 request headers without resize.
- Apply HttpUtils.mapCapacityForSize() in RxGatewayStoreModel.getHttpRequestHeaders() to account for 0.75 load factor when constructing HttpHeaders.
- Make mapCapacityForSize() public so it can be used from other packages.
- Document in populateLowerCaseHeaders() Javadoc that keys are guaranteed lowercase because HttpHeaders.set() stores them via toLowerCase(Locale.ROOT).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
xinlian12
commented
Apr 6, 2026
...e-cosmos/src/main/java/com/azure/cosmos/implementation/directconnectivity/StoreResponse.java
Extract duplicate contentStream/payload handling from both StoreResponse constructors into a shared parseResponsePayload() static helper method. Both constructors now use the array-based JsonNodeStorePayload constructor, eliminating code duplication while preserving identical behavior. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Remove both unescape(Set<Entry>) and unescape(Map) overloads from HttpUtils as they are no longer needed
- Update ResponseUtils to use the HttpHeaders-based StoreResponse constructor (same optimization as RxGatewayStoreModel)
- Remove unescape test from HttpUtilsTest, keep asMap() coverage
- Clean up unused imports (AbstractMap, ArrayList, List, Set)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace previous benchmark charts with comprehensive V3 analysis:
- 20 configs (c1/c8/c16/c32/c128 x Read/Write x HTTP1/HTTP2)
- 3 rounds each, 10 min/run, GATEWAY mode
- Timeline charts with throughput and P99 latency
- JFR allocation breakdown comparison
- Detailed per-round analysis of outlier patterns

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
JFR ObjectAllocationSample weight = estimated cumulative bytes allocated over the recording (10 min), not heap residency. Heap was 8 GB committed. The ~271 GB 'targeted' is allocation throughput (~4 GB/s alloc rate), most objects immediately GC'd. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Replace confusing cumulative GB with allocation share %
- Add GC comparison table (817 vs 813 pauses - identical)
- Frame as code efficiency improvement, not GC impact

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…arison

- Remove timeline charts (not needed for review)
- Add variance analysis using request-level metrics (transitTime) showing variance is server-side, not SDK-related
- Add JFR allocation comparison for all 20 configs
- Keep summary bar chart and c128 JFR chart

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The HashMap$ValueIterator that shows up in the c16 PR run comes from the request-sending path (ReactorNettyClient.bodySendDelegate iterating request headers), NOT from the response-side asMap() iterator we eliminated. Added a clarifying note and removed the per-config ValueIterator column (too noisy).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Performance: Reduce HashMap/Collection Allocation Overhead in Gateway Path
Motivation
JFR profiling of the baseline (main) under high-concurrency gateway workloads revealed that HashMap-related allocations (HashMap$Node, HashMap, HashMap$ValueIterator) and HTTP header collections (DefaultHeaders$HeaderEntry, HttpHeader) are responsible for a significant share of total object allocation churn.

Baseline JFR allocation profile (c128 Read HTTP/1, ObjectAllocationSample, 10-min recording), classes listed: HashMap$Node, DefaultHeaders$HeaderEntry, HashMap$ValueIterator, HttpHeader, HashMap, HttpHeaders, HashMap$Node[]

Root causes:
- HashMap<>() default initial capacity (16) forces 1-2 resize+rehash cycles for typical gateway responses with 20-30 headers, creating throwaway HashMap$Node[] arrays and re-hashed HashMap$Node entries
- StoreResponse constructor converts HttpHeaders to Map via HttpUtils.asMap() on every response, allocating a throwaway HashMap$ValueIterator and rebuilding all HashMap$Node entries
- HttpHeaders in RxGatewayStoreModel.getHttpRequestHeaders() is undersized, causing internal HashMap resize
- toLowerCase() calls on header keys that are already normalized

Changes
- HashMap<>(32) instead of HashMap<>() in RxDocumentServiceRequest, and mapCapacityForSize() helper in HttpUtils to avoid rehashing
- StoreResponse now accepts HttpHeaders directly, removing intermediate asMap() conversion that created throwaway HashMap$ValueIterator and HashMap$Node arrays
- RxGatewayStoreModel: sized to defaultHeaders.size() + headers.size() to avoid internal HashMap resize
- toLowerCase() calls: HttpHeaders.set() already normalizes keys; callers no longer double-normalize creating extra String objects

Benchmark Results
Test matrix: 1 tenant x {c1, c8, c16, c32, c128} concurrency x {Read, Write} x {HTTP/1, HTTP/2} x 3 rounds each, GATEWAY mode, 10 min/run.
Throughput Summary (ops/s, 3-round average +/- stddev)
Variance Analysis
The apparent -4% to -6% deltas at mid-concurrency (c16/c32) are not SDK regressions; they are caused by server-side transit time variability between rounds.
A dedicated 6-round reproducibility study (1t-c32-ReadThroughput-http1) with request-level metrics enabled confirms this:
The request-level breakdown shows the variance lives entirely in transitTime (server round-trip), not in SDK-side processing. hashmap-alloc has 4x lower throughput CV (1.8% vs 7.2%).
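For readers unfamiliar with the metric, CV (coefficient of variation) is the standard deviation of the per-round throughput divided by its mean. The sketch below shows one way to compute it; `VarianceStats` is a hypothetical helper, and the use of the sample (n - 1) standard deviation is an assumption, since the PR does not state which estimator was used.

```java
/** Hypothetical helper for comparing run-to-run variability across benchmark rounds. */
final class VarianceStats {
    static double mean(double[] samples) {
        double sum = 0.0;
        for (double s : samples) {
            sum += s;
        }
        return sum / samples.length;
    }

    /** Sample standard deviation (divides by n - 1); requires at least two samples. */
    static double sampleStdDev(double[] samples) {
        if (samples.length < 2) {
            throw new IllegalArgumentException("need at least two samples");
        }
        double m = mean(samples);
        double squaredDiffs = 0.0;
        for (double s : samples) {
            squaredDiffs += (s - m) * (s - m);
        }
        return Math.sqrt(squaredDiffs / (samples.length - 1));
    }

    /** Coefficient of variation as a fraction; multiply by 100 for a percentage. */
    static double coefficientOfVariation(double[] samples) {
        return sampleStdDev(samples) / mean(samples);
    }
}
```

Feeding each branch's per-round ops/s into `coefficientOfVariation` gives a scale-free number, which is what makes the two branches comparable even when their mean throughput differs.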
GC Comparison (c128 Read HTTP/1, r1)
GC behavior is identical between branches. At single-tenant scale with an 8 GB heap, the allocation reduction does not materially change GC frequency or pause time. The benefit is reduced unnecessary work (fewer resize/rehash cycles, fewer throwaway iterators) which would compound at higher tenant density.
JFR Allocation Comparison: All Configs
ObjectAllocationSample comparison for aggregate allocation share of all 9 targeted classes.

Detailed breakdown for c128 Read HTTP/1 (highest pressure, most stable JFR signal), classes listed: HashMap$Node, HashMap$ValueIterator, DefaultHeaders$HeaderEntry, DefaultHeadersImpl, HttpHeader

Summary Chart
30-Tenant Benchmark Results
Test matrix: 30 tenants x {c3, c5} concurrency-per-tenant x {Read, Write} x {HTTP/1, HTTP/2}, GATEWAY mode, single cycle (~10 min steady-state). Metrics reported from tenant 0 (representative).
7 of 8 configs are within 3% on throughput. The 30t-c3-Write/HTTP2 config shows -5.6%, which is within ATM-induced between-run variance (the runs were separated by ~4 hours, and ATM may route to a different frontend). The 30t-c5-Write/HTTP1 config shows +4.8% throughput with p99 reduced from 9.29ms to 7.72ms (-17%).
Conclusion
- ss -i); hashmap-alloc has 4x lower throughput CV (1.8% vs 7.2%)
- HashMap$ValueIterator eliminated; HashMap$Node -23%, DefaultHeaders$HeaderEntry -35% at c128